wEBMT: Developing and Validating an Example-Based Machine Translation System using the World Wide Web

Authors

  • Andy Way
  • Nano Gough
Abstract

We have developed an example-based machine translation (EBMT) system that uses the World Wide Web for two different purposes: First, we populate the system’s memory with translations gathered from rule-based MT systems located on the Web. The source strings input to these systems were extracted automatically from an extremely small subset of the rule types in the Penn-II Treebank. In subsequent stages, the 〈source, target〉 translation pairs obtained are automatically transformed into a series of resources that render the translation process more successful. Despite the fact that the output from on-line MT systems is often faulty, we demonstrate in a number of experiments that when used to seed the memories of an EBMT system, they can in fact prove useful in generating translations of high quality in a robust fashion. In addition, we demonstrate the relative gain of EBMT in comparison to on-line systems. Second, despite the perception that the documents available on the Web are of questionable quality, we demonstrate in contrast that such resources are extremely useful in automatically postediting translation candidates proposed by our system.

Related resources

The World Wide Web as a Resource for Example-Based Machine Translation Tasks

The WWW is two orders of magnitude larger than the largest corpora. Although noisy, web text presents language as it is used, and statistics derived from the Web can have practical uses in many NLP applications. For this reason, the WWW should be seen and studied as any other computationally available linguistic resource. In this article, we illustrate this by showing that an Example-Based ap...

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

Saffron a Prototype Example for Evidence Based Herbal Medicine

Evidence-based medicine is now generally perceived to be the dominant operating system in conventional medicine. Evidence-based medicine developed concurrently with the internet and the world wide web. This is no coincidence since evidence-based medicine suggests a personal responsibility for clinicians to keep abreast of research that would be difficult without the information access that the ...

Mining Chinese-English Parallel Corpora from the Web

Parallel corpora are a crucial resource in research fields such as cross-lingual information retrieval and statistical machine translation, but only a few parallel corpora with high quality are publicly available nowadays. In this paper, we try to solve the problem by developing a system that can automatically mine high quality parallel corpora from the World Wide Web. The system contains a thr...

Towards Extended Machine Translation Model for Next Generation World Wide Web

In this paper, we propose a Data Translation model that is potentially a major web service of the next-generation World Wide Web. The technique is somewhat analogous to traditional machine translation, but in scope and content it goes far beyond what has been understood as machine translation so far. To illustrate the new concept of web s...

Journal:
  • Computational Linguistics

Volume 29

Publication date: 2003